Learning Factored Representations for Partially Observable Markov Decision Processes
نویسنده
چکیده
The problem of reinforcement learning in a non-Markov environment is explored using a dynamic Bayesian network, where conditional independence assumptions between random variables are compactly represented by network parameters. The parameters are learned on-line, and approximations are used to perform inference and to compute the optimal value function. The relative effects of inference and value function approximations on the quality of the final policy are investigated, by learning to solve a moderately difficult driving task. The two value function approximations, linear and quadratic, were found to perform similarly, but the quadratic model was more sensitive to initialization. Both performed below the level of human performance on the task. The dynamic Bayesian network performed comparably to a model using a localist hidden state representation, while requiring exponentially fewer parameters.
منابع مشابه
Decision Making under Uncertainty: Operations Research Meets AI (Again)
Models for sequential decision making under uncertainty (e.g., Markov decision processes,or MDPs) have been studied in operations research for decades. The recent incorporation of ideas from many areas of AI, including planning, probabilistic modeling, machine learning, and knowledge representation) have made these models much more widely applicable. I briefly survey recent advances within AI i...
متن کاملExploiting locality of interaction in factored Dec-POMDPs
Decentralized partially observable Markov decision processes (Dec-POMDPs) constitute an expressive framework for multiagent planning under uncertainty, but solving them is provably intractable. We demonstrate how their scalability can be improved by exploiting locality of interaction between agents in a factored representation. Factored Dec-POMDP representations have been proposed before, but o...
متن کاملReinforcement Learning for Factored
Reinforcement Learning for Factored Markov Decision Processes Brian Sallans Doctor of Philosophy Graduate Department of Computer Science University of Toronto 2002 Learning to act optimally in a complex, dynamic and noisy environment is a hard problem. Various threads of research from reinforcement learning, animal conditioning, operations research, machine learning, statistics and optimal cont...
متن کاملEfficient Planning for Factored Infinite-Horizon DEC-POMDPs
Decentralized partially observable Markov decision processes (DEC-POMDPs) are used to plan policies for multiple agents that must maximize a joint reward function but do not communicate with each other. The agents act under uncertainty about each other and the environment. This planning task arises in optimization of wireless networks, and other scenarios where communication between agents is r...
متن کاملProbabilistic Knowledge-Based Programs
We introduce Probabilistic Knowledge-Based Programs (PKBPs), a new, compact representation of policies for factored partially observable Markov decision processes. PKBPs use branching conditions such as if the probability of φ is larger than p, and many more. While similar in spirit to valuebased policies, PKBPs leverage the factored representation for more compactness. They also cope with more...
متن کاملModel-based Bayesian Reinforcement Learning in Partially Observable Domains
Bayesian reinforcement learning in partially observable domains is notoriously difficult, in part due to the unknown form of the beliefs and the optimal value function. We show that beliefs represented by mixtures of products of Dirichlet distributions are closed under belief updates for factored domains. Belief monitoring algorithms that use this mixture representation are proposed. We also sh...
متن کامل